ISIS: A New Approach for Efficient Similarity Search in Sparse Databases
Abstract
High-dimensional sparse data is prevalent in many real-life applications. In this paper, we propose a novel index structure for accelerating similarity search in high-dimensional sparse databases, named ISIS, which stands for Indexing Sparse databases using Inverted fileS. ISIS clusters a dataset and converts the original high-dimensional space into a new space in which each dimension represents a cluster; the key values in the new space are then indexed using inverted files. We also propose an extension of ISIS, named ISIS+, which partitions the data space into lower-dimensional subspaces and clusters the data within each subspace. An extensive experimental study demonstrates the superiority of our approaches in high-dimensional sparse databases.
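The abstract only outlines the mapping, so the following is a minimal sketch of one plausible reading of it, under assumptions that are ours rather than the paper's. Sparse vectors are Python dicts, clusters come from plain k-means, each point is assigned to its nearest centre, and its single key value in that cluster's "dimension" is its Euclidean distance to the centre (an iDistance-style choice). Keeping one sorted inverted list per cluster lets a range query skip every point whose key lies outside [d(q, c) - r, d(q, c) + r], by the triangle inequality. All names (`kmeans`, `ClusterInvertedIndex`, `range_query`) are hypothetical.

```python
import math
import random
from bisect import bisect_left, insort

# ASSUMPTION: sparse vectors are dicts {dimension: nonzero value}; this
# cluster/key mapping is a hypothetical illustration, not the ISIS paper's code.


def dist(a, b):
    """Euclidean distance between two sparse vectors."""
    dims = a.keys() | b.keys()
    return math.sqrt(sum((a.get(d, 0.0) - b.get(d, 0.0)) ** 2 for d in dims))


def centroid(points):
    """Mean of a non-empty list of sparse vectors."""
    acc = {}
    for p in points:
        for d, v in p.items():
            acc[d] = acc.get(d, 0.0) + v
    return {d: v / len(points) for d, v in acc.items()}


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; an empty cluster keeps its previous centre."""
    centers = [dict(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: dist(p, centers[j]))].append(p)
        centers = [centroid(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers


class ClusterInvertedIndex:
    """Each cluster acts as one dimension of the new space; a point's key
    in that dimension is its distance to the cluster centre (assumed)."""

    def __init__(self, points, k):
        self.points = points
        self.centers = kmeans(points, k)
        # One inverted list per cluster, kept sorted by (key, point_id).
        self.lists = [[] for _ in range(k)]
        for pid, p in enumerate(points):
            dists = [dist(p, c) for c in self.centers]
            j = min(range(k), key=dists.__getitem__)
            insort(self.lists[j], (dists[j], pid))

    def range_query(self, q, r):
        """Ids of all points within distance r of q (exact)."""
        result = []
        for center, lst in zip(self.centers, self.lists):
            dq = dist(q, center)
            # Triangle inequality: |d(q,c) - d(p,c)| <= d(q,p), so only
            # keys in [dq - r, dq + r] can hold answers; others are skipped.
            i = bisect_left(lst, (dq - r, -1))
            while i < len(lst) and lst[i][0] <= dq + r:
                pid = lst[i][1]
                if dist(q, self.points[pid]) <= r:
                    result.append(pid)
                i += 1
        return sorted(result)
```

For example, `ClusterInvertedIndex(points, 4).range_query(q, 0.5)` returns the ids of all points within distance 0.5 of `q`. The answer is exact; the quality of the clustering only affects how many candidates have their full distance computed.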
Similar resources
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user based collaborative filtering techniques under new user cold-start problem a...
Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines
Purpose: Traditionally, there have been many metrics for evaluating search engines; nevertheless, various researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. Therefore, the purpose of this study was to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...
The GC-tree: a high-dimensional index structure for similarity search in image databases
With the proliferation of multimedia data, there is an increasing need to support the indexing and retrieval of high-dimensional image data. In this paper, we propose a new dynamic index structure called the GC-tree (or the grid cell tree) for efficient similarity search in image databases. The GC-tree is based on a special subspace partitioning strategy which is optimized for a clustered high-...
Similarity Search in 3D Protein Databases
1 Introduction We introduce a new approach for similarity search in 3-D protein databases. By using histograms, we define adaptable similarity models that address the 3-D shape as well as chemical properties of proteins. Quadratic forms are employed as similarity distance functions for which efficient query processing algorithms are available [Sei 97]. Experimental examples illustrate the appli...
A Tabu Search Method for a New Bi-Objective Open Shop Scheduling Problem by a Fuzzy Multi-Objective Decision Making Approach (RESEARCH NOTE)
This paper proposes a novel, bi-objective mixed-integer mathematical programming for an open shop scheduling problem (OSSP) that minimizes the mean tardiness and the mean completion time. To obtain the efficient (Pareto-optimal) solutions, a fuzzy multi-objective decision making (fuzzy MODM) approach is applied. By the use of this approach, the related auxiliary single objective formulation can...
Publication date: 2010